Similar resources
Transliteration and alignment of parallel texts from Cyrillic to Latin
This article describes a methodology for recovering and preserving old Romanian texts and the problems related to their recognition. Our focus is to create a gold corpus for the Romanian language (the novella Sania), for both alphabets used in Transnistria – Cyrillic and Latin. The resource is available for similar research. This technology is based on transliteration and semiautomatic alignment ...
Automatic Transliteration of Judeo-Arabic Texts into Arabic Script
The Judeo-Arabic languages comprise a set of dialects spoken and written by Jewish communities living in Arab countries, mainly during the Middle Ages. Judeo-Arabic is typically written in Hebrew letters, enriched with various diacritic marks. The Judeo-Arabic spoken and written by any particular Jewish community is similar to the Arabic dialect used by their local Muslim community. In additi...
A Transliteration based Word Segmentation System for Shahmukhi Script
Word Segmentation is an important prerequisite for almost all Natural Language Processing (NLP) applications. Since word is a fundamental unit of any language, almost every NLP system first needs to segment input text into a sequence of words before further processing. In this paper, Shahmukhi word segmentation has been discussed in detail. The presented word segmentation module is part of Shah...
Sangam: A Perso-Arabic to Indic Script Machine Transliteration Model
Indian sub-continent is one of those unique parts of the world where single languages are written in different scripts. This is the case for example with Punjabi, written in Indian East Punjab in Gurmukhi script (a Left to Right script based on Devnagri) and in Pakistani West Punjab, it is written in Shahmukhi (a Right to Left script based on Perso-Arabic). This is also the case with other lang...
Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus
This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/Chat data. The language used in social media expresses many differences from other written genres: its vocabulary is informal with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic co...
Journal
Journal title: Nature
Year: 1953
ISSN: 0028-0836,1476-4687
DOI: 10.1038/171940b0